Applying supervised and unsupervised learning methods to discover regularities in protein sequences and structures

Author

  • Joachim Selbig
Abstract

The gap between the number of known protein sequences and the number of known protein structures is rapidly increasing. Computer-based protein structure prediction aims to reduce this sequence-structure gap. Establishing connections between sequence similarity and structural similarity is a prerequisite for developing protein structure prediction methods, and the automated discovery of these connections is an excellent field for applying machine learning techniques. This paper presents an approach that identifies long-range relationships between protein sequence patterns and structural motifs by varying the granularity of the structure description. The accuracy of predicting the membership of hexamers in structural classes from sequence information provided the basis for evaluating and optimizing various classification and pattern recognition schemes. About 10 structural states seem to be optimal: enough to capture the subtleties of protein structure, yet few enough to allow discrimination of their sequence equivalents.
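
The evaluation idea in the abstract, predicting the structural class of a hexamer from its sequence and scoring the accuracy, can be illustrated with a minimal Python sketch. This is not the paper's method: the majority-vote lookup classifier, the function names, the toy sequence, and the class ids are all assumptions made up for illustration, and the abstract does not say how the structural classes are derived.

```python
from collections import Counter, defaultdict

HEXAMER_LENGTH = 6  # window size named in the abstract


def hexamers(sequence, length=HEXAMER_LENGTH):
    """Return all overlapping windows of the given length."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def train_majority_classifier(labeled_hexamers):
    """Map each hexamer to the structural class it was most often seen with.

    Assumption: a plain lookup table stands in for the classification
    schemes the paper evaluates, which the abstract does not specify.
    """
    counts = defaultdict(Counter)
    for hexamer, structural_class in labeled_hexamers:
        counts[hexamer][structural_class] += 1
    return {h: c.most_common(1)[0][0] for h, c in counts.items()}


def accuracy(classifier, labeled_hexamers, default_class):
    """Fraction of hexamers whose predicted class matches the given class."""
    if not labeled_hexamers:
        return 0.0
    correct = sum(
        classifier.get(hexamer, default_class) == structural_class
        for hexamer, structural_class in labeled_hexamers
    )
    return correct / len(labeled_hexamers)


# Toy data, invented for illustration only: one hypothetical class id
# per hexamer, standing in for one of the ~10 structural states.
toy_sequence = "MKTAYIAKQR"
toy_classes = [0, 0, 1, 1, 2]
training = list(zip(hexamers(toy_sequence), toy_classes))
model = train_majority_classifier(training)
```

In a real experiment the accuracy would be measured on hexamers held out from training, and the number of structural classes would be varied to find the granularity at which sequence-based discrimination works best.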

Similar resources

Stages in the Process of Scientific Discovery

The first step in using a discovery technique involves problem formulation, that is, stating some task in terms that can be addressed by a well-defined method. For example, to discover regularities in the area of molecular biology, one might reformulate the problem as supervised induction, as unsupervised clustering, as grammar acquisition, or as theory revision. Successful application of disco...


Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...


Experiments with Learning Techniques for Spatial Model Enrichment and Line Generalization

The nature of map generalization may be non-uniform along the length of an individual line, requiring the application of methods that adapt to the local geometry and the geographical context. Geographical databases need to be enriched in terms of shape description structures (geometrical knowledge), knowledge of appropriate order of operations and of appropriate algorithms (procedural knowledge...


Simultaneous Class Discovery and Classification of Microarray Data Using Spectral Analysis

Classification methods are commonly divided into two categories: unsupervised and supervised. Unsupervised methods have the ability to discover new classes by grouping data into clusters or tree structures without using the class labels, but they carry the risk of producing noninterpretable results. On the other hand, supervised methods always find decision rules that discriminate samples with ...


Publication date: 2007